As massive tunneling projects become increasingly common,
predicting the performance of tunnel boring machines (TBMs)
has become a pressing problem. A TBM is a modern piece of
machinery that is specially assembled to excavate a tunnel
more efficiently and safely. However, TBM performance is
very difficult to estimate because of varying geological
formations and geotechnical factors. This research aims to
predict the penetration rate (PR) of a TBM using statistical
and artificial intelligence methods that are based on rock
mass and rock material properties: rock mass rating, rock
quality designation, and rock strength. To achieve this goal,
we used two neural network-based models, an artificial neural
network (ANN) and the group method of data handling (GMDH),
to forecast TBM PR values. We then compared the performance
of these two models using well-known indices and a ranking
system and selected the model with the highest degree of
performance. As a result, an ANN model with one hidden layer
and seven neurons showed the highest capability in predicting
TBM PR. Correlation coefficient values of 0.947 and 0.921
were obtained for the training and testing phases,
respectively, of the best model in this study. Our research
can serve as a fundamental study for future geotechnical
engineers or researchers who would like to predict TBM
performance for rock mass and material properties similar to
those in this study.

1. Introduction


Mechanized tunneling encompasses any tunneling methods that use mechanical excavation tools 
such as teeth, picks, or discs. Using tunnel boring machines (TBMs) to excavate and build 
support structures is the best example of mechanized tunneling. However, shaft builders (vertical 
boring machines) and boom-type roadheaders are also employed for mechanized tunneling. 
Using a TBM can reduce the amount of caving in a tunnel, reduce disturbance to the surrounding
area, and require fewer workers than conventional methods [1,2]. Therefore, accurate
prediction of TBM performance can significantly reduce the risks associated with the high
capital cost and the tunneling excavation time or schedule [3]. During the past decades, many
researchers have employed different techniques to predict TBM performance. These techniques
fall mainly into three categories: 1) theoretical, 2) empirical or statistical, and 3) artificial
intelligence (AI).


In terms of theoretical analysis, different approaches have been introduced to solve problems of
TBM performance (e.g., [4-8]). In general, theoretical approaches predict PR by analyzing the
cutting forces acting on disc cutters with the help of force equilibrium equations (laboratory
cutting tests). Since TBM disc cutters confront rock mass conditions in the field, theoretical
models are hampered by the absence of this information. In addition, the equipment needed for
these experiments may not be accessible at every research facility around the globe [9]. If such
equipment is not available, it may be necessary to modify TBM performance data to estimate the
TBM's performance accurately.


As for the empirical or statistical models, they predominantly use predictors (inputs) to forecast
the TBM performance (output) through mathematical relationships. A linear multiple regression
(LMR) equation was introduced by Hamidi et al. [10] to forecast the boreability of a TBM along
8.5 km of the Zagros long tunnel, which was being constructed in sedimentary rock. In another
regression analysis, Hassanpour et al. [11] used multiple regression to establish the
relationship between the field penetration index (FPI) and geological parameters. The study was
based on the Manapouri tunnel in New Zealand and three other projects in Iran. Yagiz [12]
performed a series of simple regressions to find the parameters best correlated with FPI. Jing et
al. [13] developed a statistical model to correlate the penetration rate (PR) with rock mass
parameters. The study was based on data from the Songhua River Water Supply Project in China,
where 7.3 km of tunnel in a limestone region was excavated using a TBM. Although statistical
models are known for their simplicity and efficiency, their performance capacities are low,
especially when encountering extreme values in the data [14]. They also lack the capability and
robustness to solve non-linear and complex relationships [15].


The last group of models available for TBM performance prediction is AI and machine learning
(ML) [16-21]. Many researchers have shown that these techniques can effectively solve science
and engineering problems [22-26]. For example, Salimi et al. [27] developed models for
predicting the FPI of TBM using genetic programming (GP) and classification and regression
tree (CART) techniques. The model generated by CART outperformed the GP model, with a
coefficient of determination (R²) of 0.91 versus 0.86. Mahdevari et al. [28] proposed a support
vector regression (SVR) model to predict the PR of TBM. The study was conducted on 150
datasets from the Queens Tunnel using rock mass, rock material, and machine properties. The
results of the SVR showed that this AI technique can provide a high model capability
(R² = 0.949). In another investigation, a gene expression programming equation was suggested
by Armaghani et al. [15] for solving TBM performance. They reported an easy-to-use equation as
well as an acceptable performance for predicting TBM PR values. Yagiz et al. [29] conducted a
study to propose a PR prediction model using an artificial neural network (ANN) methodology.
Yagiz and Karahan [30] performed another study with good results to estimate the TBM
performance using a particle swarm optimization system. Yang et al. [31] proposed a hybrid
tree-based technique, the improved sparrow search algorithm-gradient boosting regression tree,
to predict TBM PR, and reported that their model performed well. Overall, AI and ML models
have an acceptable capacity in predicting TBM performance.


In this study, we examine the effects of rock properties (mass and material) on TBM
performance, specifically PR values. Previous studies typically consider a component related to
the TBM itself, such as revolutions per minute or cutter force, as an input parameter. However,
because the aim is to predict TBM performance before the construction process begins, no TBM
data are available at that stage. The only information available is from the site investigation
phase, which includes rock or soil properties or a combination of the two [32]. Using these
available variables to predict PR before construction is therefore more reasonable and
practical in tunnel construction. Accordingly, we developed two predictive models, the group
method of data handling (GMDH) and ANN, to evaluate and predict the performance of TBM.
The predictors used to forecast TBM PR are rock mass and rock material properties. The
developed models are compared in terms of their performance indices and powers of prediction,
and the best intelligent predictive model is then determined and used for TBM performance
prediction.


The rest of the paper is organized as follows: 


Section 2 introduces the concept and principle of the machine learning models; Section 3 
illustrates the engineering background and data source; Section 4 describes the process of 
modeling; Section 5 discusses the results of modeling; Section 6 explains the future work; 
Section 7 summarizes the main conclusions of the paper. 


2. Model background 


2.1. Artificial neural network (ANN) 


An artificial neural network (ANN) is an AI system that was designed to mimic certain
characteristics of the human nervous system [33]. Unlike traditional AI models, ANNs can learn
patterns and relationships from training data and can process information in a manner similar to
the way the human brain does [34]. An ANN consists of artificial neurons, which receive signals
and pass them through an activation function to produce an output [35]. These outputs are then
used as inputs for subsequent neurons. ANNs can be trained repeatedly to improve their
performance, and during this process the architecture and connection weights of the network are
modified iteratively to reduce errors in the predicted data [36]. The net weighted input
($net_j$) for each node (e.g., neuron $j$) is calculated by (Fig. 1):


$$net_j = \sum_{i=1}^{n} x_i w_{ij} - \theta_j \tag{1}$$

where the two parameters $x_i$ and $w_{ij}$ indicate each input signal and its corresponding
weight, respectively, while $n$ is the number of incoming signals $i$ that transmit to the
processing neuron $j$. The threshold applied to neuron $j$ is represented by the parameter
$\theta_j$. An activation function is applied to this net input. This process is known as the
"training approach". Subsequently, the output is compared with the actual value, and the
resulting error passes back across the network, allowing the individual weights to be fine-tuned.


[Figure: a single neuron with inputs x_1, ..., x_n, net input net_j, threshold θ_j, and an
activation function producing the output.]

Fig. 1. A typical neuron's architecture in ANN.
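

The neuron computation described in this subsection can be illustrated with a minimal Python sketch. It assumes that the threshold is subtracted from the weighted sum and that the activation is a sigmoid (the transfer function named later in Section 4.2); the function name `neuron_output` and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def neuron_output(inputs, weights, threshold):
    """Weighted sum of the inputs minus the threshold, passed through a sigmoid.

    Assumption: the threshold is subtracted from the weighted sum; the source
    only states that a threshold is applied to the neuron.
    """
    net = sum(x * w for x, w in zip(inputs, weights)) - threshold
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical input signals and weights, for illustration only.
x = [0.2, 0.5, 0.9]
w = [0.4, -0.3, 0.8]
out = neuron_output(x, w, threshold=0.1)
```

During training, the difference between `out` and the measured value would be propagated back to adjust `w` and the threshold; that step is omitted here.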


2.2. Group method of data handling (GMDH) 


GMDH is a family of algorithms used to model the function or relationship between several
predictors and an output [37,38]. It is a neural network algorithm that allows the software to
learn the relationship between the predictors and the desired output. The GMDH algorithm allows
the analysis of several mathematical forms, i.e., polynomial, non-linear, and probabilistic, to
discover the most suitable model for prediction purposes [39]. In GMDH, the system is run by
layers of neurons, and the number of neurons in a layer is defined by the number of inputs
inserted into the system. To elucidate, if the number of inputs is denoted by $x$, then, since
the system considers all pairwise combinations of input variables, the number of neurons is
equal to $\binom{x}{2}$ [40]. The GMDH model can be constructed to forecast the output value
based on any given inputs (Fig. 2):


$$Y = f(x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iq}), \quad i = 1, 2, 3, \ldots, q \tag{2}$$


where $Y$ is the output, $f$ is a function, $x$ is an input vector, and $i$ is the number of
observations (from the 1st to the $q$-th observation). More details related to GMDH, its
effective parameters, and the modeling process can be found elsewhere (e.g., [41]).
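

The pairwise construction described above can be sketched in a few lines of Python. The sketch enumerates the input pairs that would each feed one first-layer neuron and evaluates a two-input quadratic polynomial, which is a commonly used GMDH neuron form and is assumed here, since the source does not state which polynomial was used. The names `candidate_pairs` and `quadratic_neuron`, the input labels, and the coefficients are hypothetical.

```python
from itertools import combinations
from math import comb


def candidate_pairs(input_names):
    """Return every pairwise combination of inputs (one per first-layer neuron)."""
    return list(combinations(input_names, 2))


def quadratic_neuron(xa, xb, coeffs):
    """Evaluate a two-input quadratic polynomial (assumed neuron form)."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 + a1 * xa + a2 * xb + a3 * xa * xb + a4 * xa ** 2 + a5 * xb ** 2


inputs = ["x1", "x2", "x3", "x4"]  # four inputs, matching the case shown in Fig. 2
pairs = candidate_pairs(inputs)
n_neurons = comb(len(inputs), 2)   # number of pairs, i.e. "x choose 2"
```

In a full GMDH implementation the coefficients of each neuron would be fitted by least squares and only the best-performing neurons would be passed to the next layer; both steps are left out of this sketch.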
Fig. 2. A GMDH model with four input variables.


2.3. Performance indices 


To compare the accuracy of AI models, several performance indices are normally used. Here,
five performance indices are selected: R², root mean square error (RMSE), variance accounted
for (VAF, %), mean absolute error (MAE), and the a20-index. R² describes the global fit of the
model: it is the proportion of the variance of the output that is explained by the predictors in
the model, and its ideal value is 1. The RMSE is the standard deviation of the estimation error
and evaluates how far the errors spread around the best-fit model. The ideal RMSE is 0, meaning
that all the predicted values equal the observed values. The VAF describes how much of the
variability in the data is captured by the model; it is expressed as a percentage, and a higher
percentage indicates better accuracy. The MAE is the average of the errors between predictions
and observations over a number of datasets. The a20-index is similar to the MAE in that it
measures the deviation of a predicted value from an actual value. The ideal value of the
a20-index is 1. The formulas for the mentioned indices are presented as follows:

where $y_i$, $\hat{y}_i$ and $\bar{y}$ are the measured, predicted, and mean values of TBM
performance, respectively, $n$ is the number of data samples, and m20 is the number of samples
for which the ratio of measured to predicted values lies between 0.8 and 1.2.
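

As an illustration of the five indices, the following Python sketch uses their commonly cited textbook definitions. These definitions are assumptions, not a reproduction of the formulas used in this study: R² is taken as one minus the ratio of residual to total sum of squares, VAF as one minus the ratio of error variance to measured variance (in percent), and the a20-index as m20 divided by n. The function name `performance_indices` and the sample values are hypothetical.

```python
import math


def performance_indices(measured, predicted):
    """Return R2, RMSE, VAF (%), MAE, and a20 using standard definitions (assumed)."""
    n = len(measured)
    mean_y = sum(measured) / n
    errors = [y - p for y, p in zip(measured, predicted)]
    sse = sum(e * e for e in errors)                    # residual sum of squares
    sst = sum((y - mean_y) ** 2 for y in measured)      # total sum of squares
    mean_e = sum(errors) / n
    var_e = sum((e - mean_e) ** 2 for e in errors) / n  # variance of the errors
    var_y = sst / n                                     # variance of the measurements
    # m20: samples whose measured-to-predicted ratio lies between 0.8 and 1.2
    m20 = sum(1 for y, p in zip(measured, predicted) if 0.8 <= y / p <= 1.2)
    return {
        "R2": 1.0 - sse / sst,
        "RMSE": math.sqrt(sse / n),
        "VAF": (1.0 - var_e / var_y) * 100.0,
        "MAE": sum(abs(e) for e in errors) / n,
        "a20": m20 / n,
    }


# Hypothetical measured and predicted values, for illustration only.
scores = performance_indices([2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 7.0])
```

A perfect model would return R2 = 1, RMSE = 0, VAF = 100, MAE = 0, and a20 = 1, which matches the ideal values listed in this subsection.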


3. Study area and data samples 


Selangor is one of the most densely populated states in Malaysia. As such, there is a large
demand for water supply to support the residents in the area. The Pahang Selangor Raw Water
Transfer (PSRWT) tunnel project aims to divert water from Pahang to Selangor through a tunnel
excavated using three TBMs. The objective of this project is to transfer 1890 million liters of
water diverted from the Semantan River in Pahang to the South Klang Valley region in Selangor.
Water from the Semantan River is extracted from the reservoir by the pumping station next to the
intake and conveyed via a pipe to a connecting basin at the tunnel inlet. This basin diverts the
raw water to the outlet connecting basin by gravity flow.


The PSRWT tunnel project is located in a Main Range Granite region, where quartz dykes with
clay were observed. Quartz veins were also noticed in the bottom half of the tunnel. The route of
the PSRWT tunnel is divided into four sections based on the type of rock in each area.
Section 1, from Ch. 0.8 km to Ch. 3.8 km, is made up of Devonian sedimentary rocks that have
been slightly metamorphosed. It is mostly black shale to schist, but it has been heavily folded
because granitic rocks intruded during the Triassic period. The maximum overburden (cover) in
this section is 240 m. Section 2, from Ch. 3.8 km to Ch. 12.5 km, is made up of coarse-grained
granitic rocks, with an overburden of between 33 m and 483 m. Section 3, from Ch. 12.5 km to
Ch. 27.0 km, is made up of coarse to medium-grained granitic rocks. The maximum overburden in
this section is 1390 m. Section 4, the last part of the PSRWT tunnel, lies between Ch. 27.0 km
and Ch. 44.6 km and is made up of weathered granite from the Main Range.


The tunnel runs through six major faults: Karak (Ch. 2.5 km), Krau (Ch. 12.45 km), Bukit Tinggi 
(Ch. 19.15 km), Lepoh (Ch. 28.6 km), Kongkoi (Ch. 31.35 km), and Tekali (Ch. 39.0 km) [42]. 
In places near faults, rocks have low strength. Moderately to highly weathered zones have been
detected in the fault zones of the PSRWT tunnel project. The PSRWT tunnel project is mainly
underlain by Main Range Granite.

This study requires a database consisting of rock mass and material properties (model inputs)
as well as TBM PR (model output). Therefore, field observations were conducted to collect rock
mass properties in the tunnel. During these field observations, several parameters were
recorded, namely the rock quality designation (RQD), the rock mass rating (RMR), and the
weathering zone (WZ). These rock mass properties are described in detail below.


RQD was introduced by Deere [43] to assess the quality of rock. RQD is a more sensitive
measure of rock quality than total core recovery. Many engineers emphasize that a higher RQD
value does not necessarily mean that the strength of the rock is also high; RQD measures only
the quality of the rock itself. A total of 259 RQD values (259 panels) were collected, with an
average of 34.11%, a minimum of 8.25%, a maximum of 93.75%, and a standard deviation of 23.1%.


RMR was developed by Bieniawski [44] at the South African Council for Scientific and
Industrial Research. Six factors govern the RMR: uniaxial compressive strength (UCS), RQD,
joint spacing, joint condition, groundwater condition, and joint orientation. A higher RMR
suggests that the rock has very good characteristics. RMR can be used to determine the
preliminary cohesion and angle of internal friction of a certain rock formation. An average RMR
value of 58.50 was observed across the 259 data samples from the PSRWT tunnel project, with a
minimum rating of 45, a maximum rating of 93, and a standard deviation of 12.12.


Rock mass weathering starts at the discontinuities or joints at the surface of the rock
formation, as joints facilitate the weathering process and subsequently produce different
weathering zones. Weathering takes place when the less stable minerals break down and the
discoloration of rock penetrates deeply inside the rock. In the data collected from the PSRWT
tunnel, there are three weathering zones, i.e., fresh, slightly weathered, and moderately
weathered. In the database, these zones were coded as one for fresh, two for slightly
weathered, and three for moderately weathered. It is important to note that PR values were
recorded for each panel, which was typically 10 m long. The minimum, maximum, and average TBM
PR values were 1.72 m/h, 5.06 m/h, and 2.96 m/h, respectively.


Together with field observation, laboratory tests were also carried out in this study to determine
rock material properties. Two strength-based tests, i.e., UCS and Brazilian tensile strength
(BTS), were conducted on the samples. An average value of 88.9 MPa was obtained for the UCS
tests, while 184 MPa and 40 MPa were recorded as the maximum and minimum values, respectively.
In addition, values of 7.23 MPa, 4.69 MPa, and 13.75 MPa were obtained for the average,
minimum, and maximum BTS results, respectively. To achieve the designed objective, the five
parameters mentioned above, i.e., UCS, BTS, WZ, RMR, and RQD, were set as model inputs,
and TBM PR was considered as the model output. A database with 259 data samples was prepared
for the PR analysis. Fig. 3 shows the linear correlation (Pearson's r) between each pair of
variables.
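

For readers who want to reproduce this kind of pairwise analysis, the following Python sketch computes Pearson's r from its standard definition (covariance divided by the product of the standard deviations). The function name `pearson_r` and the sample values are hypothetical; the values are not taken from the PSRWT database.

```python
import math


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical panel values: PR falls as RQD rises, so r should be negative.
rqd = [10.0, 25.0, 40.0, 60.0, 90.0]
pr = [4.8, 4.1, 3.2, 2.6, 1.9]
r = pearson_r(rqd, pr)
```

Applying the same function to every pair of columns in the database would give a correlation matrix of the kind summarized in Fig. 3.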


S.K. Eng et al./ Journal of Soft Computing in Civil Engineering 7-2 (2023) 138-154 145 


[Figure: scatter-plot matrix of RQD (%), UCS (MPa), RMR, BTS (MPa), weathering zone, and
PR (m/h), with Pearson's r reported for each pair of variables.]

Fig. 3. The relationship between the pairwise variables.


4. Methodology 


4.1. GMDH modeling 


The AI techniques in this study (GMDH and ANN) were built to forecast TBM PR. As a first
step, the independent and dependent variables should be normalized. Normalization brings the
variables to a common scale without distorting the differences in the ranges of the original
data. Equation (8) gives the formula used for normalization:


$$Y_{normalized} = \frac{Y - Y_{min}}{Y_{max} - Y_{min}} \tag{8}$$


where $Y_{normalized}$ is the normalized value, $Y$ is the original value, $Y_{min}$ is the
minimum value in the dataset, and $Y_{max}$ is the maximum value in the dataset. The next stage
is to divide the whole database into training and testing parts for model construction and
model assessment, respectively. Based on the literature, most researchers use 80% of the data
for training and the remaining 20% for testing. Therefore, 80% of the entire dataset (207 data
samples) was used to train the AI models, and the remaining 20% (52 data samples) was used to
test them.
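

The two preprocessing steps above, min-max normalization (Eq. (8)) and the 80/20 split, can be sketched in Python as follows. The random shuffle and its seed are assumptions, since the source does not state how samples were assigned to the two sets, and the function names `min_max_normalize` and `train_test_split` are hypothetical.

```python
import random


def min_max_normalize(values):
    """Scale values to [0, 1] using (Y - Y_min) / (Y_max - Y_min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples (assumed) and split them into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Minimum, average, and maximum PR values reported in Section 3 (m/h).
scaled = min_max_normalize([1.72, 2.96, 5.06])

# Indices of the 259 data samples; an 80/20 split gives 207 and 52 samples.
train, test = train_test_split(range(259))
```

With 259 samples, truncating 0.8 x 259 = 207.2 to an integer reproduces the 207/52 split stated in the text.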


In the modeling of GMDH, it is necessary to first identify the influencing factors of this
technique, which are the number of neurons and the number of layers. A parametric study on the
number of neurons showed that 12 neurons (12N) give the best performance in estimating TBM PR.
The next step in modeling is therefore to determine the number of layers through another
parametric study. Tables 1 and 2 present the performance indices for the training and testing
data samples when different numbers of layers (L) are applied. A ranking system introduced by
Zorlu et al. [45] was applied to the results. According to this ranking system, the most
accurate result receives the highest rank value in each category. The rank values of the
training and testing models are presented in the last column of Tables 1 and 2, respectively.
The training and testing rank values are summed to give a final rank value for each model. The
final rank values were computed as 70, 61, 58, 40, 51, 46, 64, 62, and 39 for the 2L-12N,
3L-12N, 4L-12N, 5L-12N, 6L-12N, 7L-12N, 8L-12N, 9L-12N, and 10L-12N models, respectively. The
two-layer model clearly has the highest accuracy in predicting the PR based on the results of
both the training and testing sets. Fig. 4 shows the target and output PR values, together with
the errors, for both the training and testing stages of the best GMDH model. Note that the GMDH
results are reported as the correlation coefficient (R) in Tables 1 and 2, whereas they are
reported as R² in Fig. 4.
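

The ranking procedure can be sketched in Python as below. The helper `dense_rank` shows one plausible way to turn index values into rank values (the best value receives the highest rank and ties share a rank); it is an assumption and is not guaranteed to reproduce every rank in Tables 1 and 2. The helper `final_ranks` sums the training and testing rank values, which are copied from the last columns of those tables. Both function names are hypothetical.

```python
def dense_rank(values, higher_is_better=True):
    """Rank values so that the best one gets the highest rank; ties share a rank."""
    ordered = sorted(set(values), reverse=not higher_is_better)
    position = {v: i + 1 for i, v in enumerate(ordered)}
    return [position[v] for v in values]


def final_ranks(train_ranks, test_ranks):
    """Sum the training and testing rank values for each model."""
    return {model: train_ranks[model] + test_ranks[model] for model in train_ranks}


# Rank values from the last columns of Tables 1 (training) and 2 (testing).
train = {"2L-12N": 37, "3L-12N": 38, "4L-12N": 25, "5L-12N": 25, "6L-12N": 10,
         "7L-12N": 36, "8L-12N": 22, "9L-12N": 31, "10L-12N": 18}
test = {"2L-12N": 33, "3L-12N": 23, "4L-12N": 33, "5L-12N": 15, "6L-12N": 41,
        "7L-12N": 10, "8L-12N": 42, "9L-12N": 31, "10L-12N": 21}

totals = final_ranks(train, test)
best = max(totals, key=totals.get)

# Example of ranking three models on R (higher is better) and RMSE (lower is better).
r_ranks = dense_rank([0.9134, 0.9203, 0.9054], higher_is_better=True)
rmse_ranks = dense_rank([0.0775, 0.0719, 0.0792], higher_is_better=False)
```

The summed values match the final rank values quoted in the text, and the two-layer model has the largest total.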


Table 1
Performance indices and ranking system of GMDH layer for training dataset (12 neurons).

| Model | R | RMSE | VAF (%) | MAE | a-20 | Rank R | Rank RMSE | Rank VAF | Rank MAE | Rank a-20 | Rank value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2L-12N | 0.9134 | 0.0775 | 80.5167 | 0.00344 | 0.70048 | 7 | 7 | 7 | 6 | 10 | 37 |
| 3L-12N | 0.9203 | 0.0719 | 81.6970 | 0.00386 | 0.69565 | 8 | 9 | 9 | 3 | 9 | 38 |
| 4L-12N | 0.9054 | 0.0792 | 77.5948 | 0.0018 | 0.66184 | 4 | 4 | 3 | 9 | 5 | 25 |
| 5L-12N | 0.9044 | 0.0793 | 78.5915 | 0.00249 | 0.66667 | 3 | 3 | 5 | 8 | 6 | 25 |
| 6L-12N | 0.8965 | 0.0803 | 75.2453 | 0.00387 | 0.64734 | 1 | 2 | 1 | 2 | 4 | 10 |
| 7L-12N | 0.9214 | 0.0724 | 81.2065 | 0.00386 | 0.67633 | 9 | 8 | 8 | 4 | 7 | 36 |
| 8L-12N | 0.9038 | 0.0811 | 78.9432 | 0.00373 | 0.68599 | 2 | 1 | 6 | 5 | 8 | 22 |
| 9L-12N | 0.9090 | 0.0785 | 77.7428 | 0.00302 | 0.68599 | 6 | 6 | 4 | 7 | 8 | 31 |
| 10L-12N | 0.9055 | 0.0786 | 77.4794 | 0.00387 | 0.66184 | 5 | 5 | 2 | 1 | 5 | 18 |

Table 2
Performance indices and ranking system of GMDH layer for testing dataset (12 neurons).

| Model | R | RMSE | VAF (%) | MAE | a-20 | Rank R | Rank RMSE | Rank VAF | Rank MAE | Rank a-20 | Rank value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2L-12N | 0.9055 | 0.0767 | 73.7728 | 0.05776 | 0.69231 | 6 | 7 | 5 | 5 | 10 | 33 |
| 3L-12N | 0.8854 | 0.0913 | 72.3109 | 0.06748 | 0.67308 | 4 | 3 | 4 | 3 | 9 | 23 |
| 4L-12N | 0.9111 | 0.0793 | 77.6657 | 0.05605 | 0.63462 | 7 | 6 | 7 | 6 | 7 | 33 |
| 5L-12N | 0.8781 | 0.0929 | 71.7538 | 0.07041 | 0.59615 | 2 | 2 | 3 | 2 | 6 | 15 |
| 6L-12N | 0.9428 | 0.0754 | 82.39 | 0.05225 | 0.63462 | 9 | 8 | 8 | 9 | 7 | 41 |
| 7L-12N | 0.8548 | 0.1027 | 52.5996 | 0.07894 | 0.59615 | 1 | 1 | 1 | 1 | 6 | 10 |
| 8L-12N | 0.9215 | 0.0737 | 83.8994 | 0.05423 | 0.67308 | 8 | 9 | 9 | 7 | 9 | 42 |
| 9L-12N | 0.9024 | 0.0799 | 77.473 | 0.05408 | 0.63462 | 5 | 5 | 6 | 8 | 7 | 31 |
| 10L-12N | 0.8812 | 0.0913 | 69.7066 | 0.06404 | 0.65385 | 3 | 4 | 2 | 4 | 8 | 21 |

Fig. 4. Graph of predicted PR vs actual PR for training and testing datasets of the best GMDH model.

4.2. ANN modeling 


To create an ANN model, the number of hidden layers, the number of nodes, and the transfer
function must all be designed. In this study, the number of hidden layers was kept to a minimum
to avoid overfitting. Overfitting occurs when a model fits a specific collection of data so
closely that it may fail to match additional data or to forecast future observations accurately.
As the number of hidden layers increases, the possibility of overfitting also increases.
Therefore, we decided to use only a single hidden layer. In addition, according to previous
studies, the sigmoid transfer function achieves acceptable results for non-linear problems [46].
Therefore, this transfer function was used for the ANN modeling part. To select the number of
neurons, several ANN models with different numbers of neurons were constructed; their results
for the training and testing parts are presented in Tables 3 and 4, respectively. As in the GMDH
modeling, rank values for each model and each part (i.e., training and testing) are shown in
these tables. The final rank values of 18, 37, 46, 55, 56, 63, 72, 59, and 56 were then obtained
for the models with 1-9 neurons, respectively. The model with the highest accuracy in predicting
TBM performance is model 7N (with seven neurons). Therefore, we selected this model to forecast
TBM performance. In Fig. 5, the differences between measured and predicted PR values for the
testing part of the selected ANN model are displayed.
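

To make the architecture concrete, the Python sketch below builds a network with a single sigmoid hidden layer and counts its trainable parameters for 1 to 9 hidden neurons. Several details are assumptions rather than statements about the model used in this study: the weights are random and untrained, biases are added (which is equivalent to subtracting a threshold), the output neuron is linear, and four inputs are used, following the four input parameters mentioned in Section 5. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, seed=0):
    """Create random (untrained) weights; the last entry of each row is a bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def predict(network, x):
    """Forward pass: sigmoid hidden layer followed by a linear output (assumed)."""
    hidden, output = network
    h = [sigmoid(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1]) for row in hidden]
    return sum(w * hi for w, hi in zip(output[:-1], h)) + output[-1]


def count_parameters(n_inputs, n_hidden):
    """Weights and biases of the hidden layer plus those of the single output neuron."""
    return n_hidden * (n_inputs + 1) + (n_hidden + 1)


network = init_network(n_inputs=4, n_hidden=7)
y_hat = predict(network, [0.3, 0.6, 0.5, 0.2])  # hypothetical normalized inputs
sizes = [count_parameters(4, n) for n in range(1, 10)]
```

Under these assumptions the seven-neuron model has 43 trainable parameters, which stays well below the 207 training samples; this is one informal way to see why a small single hidden layer limits the risk of overfitting.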


Table 3
Performance indices and ranking system of the ANN models for the training part.

| Model | R | RMSE | VAF (%) | MAE | a-20 | Rank R | Rank RMSE | Rank VAF | Rank MAE | Rank a-20 | Rank value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1N | 0.9125 | 0.0776 | 79.7337 | 0.00373 | 0.66667 | 1 | 1 | 1 | 1 | 1 | 5 |
| 2N | 0.9219 | 0.0719 | 82.3433 | 0.00352 | 0.73913 | 2 | 2 | 2 | 2 | 3 | 11 |
| 3N | 0.9303 | 0.0687 | 84.4259 | 0.00271 | 0.74396 | 5 | 5 | 5 | 7 | 2 | 24 |
| 4N | 0.9295 | 0.0705 | 84.1871 | 0.00345 | 0.74396 | 4 | 4 | 4 | 3 | 6 | 21 |
| 5N | 0.9280 | 0.0712 | 83.8243 | 0.0033 | 0.75845 | 3 | 3 | 3 | 4 | 4 | 17 |
| 6N | 0.9347 | 0.0652 | 85.4856 | 0.00294 | 0.78261 | 6 | 6 | 6 | 6 | 5 | 29 |
| 7N | 0.9473 | 0.0612 | 88.5332 | 0.00322 | 0.78744 | 8 | 8 | 8 | 5 | 8 | 37 |
| 8N | 0.9441 | 0.0631 | 87.7911 | 0.00249 | 0.83575 | 7 | 7 | 7 | 8 | 7 | 36 |
| 9N | 0.9581 | 0.0523 | 91.0399 | 0.00201 | 0.82126 | 9 | 9 | 9 | 9 | 9 | 45 |

Table 4
Performance indices and ranking system of the ANN models for the testing part.

| Model | R | RMSE | VAF (%) | MAE | a-20 | Rank R | Rank RMSE | Rank VAF | Rank MAE | Rank a-20 | Rank value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1N | 0.9011 | 0.0788 | 72.1582 | 0.06095 | 0.67308 | 2 | 3 | 2 | 2 | 4 | 13 |
| 2N | 0.9178 | 0.0780 | 82.964 | 0.05672 | 0.75 | 5 | 4 | 6 | 3 | 8 | 26 |
| 3N | 0.9014 | 0.0829 | 78.0805 | 0.05271 | 0.69231 | 3 | 2 | 5 | 7 | 5 | 22 |
| 4N | 0.9267 | 0.0729 | 85.7628 | 0.05517 | 0.67308 | 7 | 6 | 9 | 5 | 7 | 34 |
| 5N | 0.9311 | 0.0634 | 84.7808 | 0.05046 | 0.76923 | 8 | 9 | 8 | 8 | 6 | 39 |
| 6N | 0.9335 | 0.0774 | 83.6527 | 0.05625 | 0.71154 | 9 | 5 | 7 | 4 | 9 | 34 |
| 7N | 0.9209 | 0.0682 | 77.6562 | 0.04925 | 0.73077 | 6 | 8 | 4 | 9 | 8 | 35 |
| 8N | 0.9120 | 0.0698 | 76.7197 | 0.05486 | 0.78846 | 4 | 7 | 3 | 6 | 3 | 23 |
| 9N | 0.8395 | 0.1159 | 57.9587 | 0.08355 | 0.65385 | 1 | 1 | 1 | 1 | 7 | 11 |

Fig. 5. Graph of predicted PR vs actual PR for training and testing datasets of the best ANN model.

5. Results and discussion 


In this study, to evaluate the effects of rock mass and material properties on TBM performance,
two AI techniques, i.e., ANN and GMDH, were selected and applied. The most important
parameters of each technique were then tuned through a series of parametric studies. After that,
a ranking system was used to select the best model among all constructed models. The ranking
system considers the effects of all five performance indices, not only one or two of them.
Table 5 shows the training and testing results of the best ANN and GMDH models, which use only
four input parameters. Based on this table, ANN is clearly the better model for predicting the
PR of TBM: it achieved a higher accuracy level and a lower system error than GMDH. The R of the
selected ANN model (i.e., with 1 hidden layer and 7 neurons) is 0.947 for the training dataset
and 0.921 for the testing dataset. The R of ANN is higher than that of GMDH for both training
and testing, which implies that ANN is superior. The RMSE of the ANN model is 0.061 and 0.068
for the training and testing sets, respectively, while the MAE is 0.003 and 0.049. These two
performance indices show that the ANN model has the lowest error. Lastly, the VAF of the ANN
model is 88.533% and 77.656% for training and testing, respectively, from which it can be
concluded that ANN is the better of the two models developed.


By comparison, in a study by Eftekhari et al. (2010), the proposed ANN model had a coefficient
of determination of 0.69, where the parameters considered were rock type, quartz content (q),
UCS, BTS, RQD, RMR, thrust, torque, and Rs [47]. In another study, the developed ANN model had
an R² of 0.72; the independent variables considered were primarily rock mass and rock material
properties, i.e., UCS, RQD, joints per volume (Jv), and joint spacing (Js). In another study,
Armaghani et al. (2017) proposed an ANN model to predict both the PR and the advance rate (AR)
of TBM using rock mass and rock material properties as well as machine parameters, and reported
coefficients of 0.666 and 0.706 [48]. In short, the ANN proposed in this study predicts TBM
performance more accurately than these earlier models developed by other researchers.


The higher performance capacity of ANN in this study is most probably due to the nature of the
ANN technique. Occasionally, a single ANN model yields a better result than a hybrid model.
Sada and Ikpeseni [49] used both ANN and neuro-fuzzy models to predict steel machine
performance and noted that ANN was superior to neuro-fuzzy in their study. Neuro-fuzzy was
expected to perform better, as it is a hybrid of ANN and fuzzy logic, but some studies [49,50]
show that a single model can outperform a hybrid model. In this study, GMDH is an improved
model of ANN; nevertheless, ANN has a higher performance capacity than GMDH.


The purpose of this study is to present a practical AI and ML technique that is fully based on
rock material and mass properties for TBM PR prediction. The models proposed in this study
were developed using four parameters, i.e., RMR, RQD, UCS, and BTS. These parameters can be
easily determined during the site investigation phase using field observation and laboratory
tests. Using the developed models, TBM PR can be estimated with a high accuracy level or low
system error. The predicted values of TBM PR can be used to select the best features of the TBM
for the excavation of the proposed tunnel. In addition, the construction schedule for tunneling
can be well organized based on the results obtained from the developed models.


The limitations of the present work are the specific geological background and the low diversity
of the dataset. The present study was carried out on the PSRWT project, in which the geological
formation is granite. Hence, the model is only applicable to future TBM tunneling projects that
excavate through granite. In addition, the collected dataset only included information on rock
mass and rock material properties, which limits the versatility of the model.

Table 5
The results of best models to predict TBM performance.

| Category | Model | R² | RMSE | VAF (%) | MAE | a-20 |
|---|---|---|---|---|---|---|
| Train | ANN (1 hidden layer-7 neurons) | 0.897 | 0.061 | 88.533 | 0.003 | 0.787 |
| Train | GMDH (2 layers-12 neurons) | 0.843 | 0.077 | 80.517 | 0.003 | 0.700 |
| Test | ANN (1 hidden layer-7 neurons) | 0.848 | 0.068 | 77.656 | 0.049 | 0.731 |
| Test | GMDH (2 layers-12 neurons) | 0.820 | 0.077 | 73.773 | 0.058 | 0.692 |


6. Recommendation for future work 


In this research, the inputs considered are primarily rock mass and rock material properties, i.e.,
UCS, BTS, RMR, and RQD. It is recommended that future studies consider the effects of other
rock mass and material parameters. In addition, a wider range of the same parameters can be
prepared to propose an ML model with a high level of generalization. To do this, results from
various case studies can be gathered to establish a more comprehensive database for TBM
performance prediction. It is also advised to utilize hybrid ML/AI models to predict TBM
performance. Many researchers [51-53] have stated that hybrid models can provide a better
relationship in predicting TBM performance. For instance, a hybrid model integrating a support
vector machine with optimization algorithms, e.g., gray wolf optimization, moth flame
optimization, and whale optimization, may provide better accuracy than single models. The
authors would also like to mention that ML/AI models should move toward approaches that focus
more on the physical relationships between input and output parameters. This would convert a
pure ML model into a theory-based or physics-based ML model, which is more applicable in civil
engineering, especially in the geomechanics and geotechnics fields.


7. Conclusions 


In this study, to achieve a higher performance in predicting the PR of TBM, which is the
ultimate aim of prediction models, two ML/AI methods, i.e., ANN and GMDH, were proposed using
four independent rock mass and material properties, i.e., UCS, BTS, RQD, and RMR. The models
were established on a dataset from the PSRWT tunnel project in Malaysia. Several commonly used
assessment metrics were used to examine the performance of each model. The results show that
the GMDH model with two layers and 12 neurons is the best GMDH model for predicting the PR with
these four types of input. In the ANN modeling, we found that the model with seven neurons is
the best for forecasting TBM PR. A ranking system was used to compare the models. Finally, it
was observed that ANN has the highest prediction performance, followed by GMDH. It was shown
that a single ANN model can occasionally perform much better than an improved model, depending
on the nature of the dataset. The developed ANN can be used in engineering cases with
geological conditions similar to those of the current study.